Neural representations for modeling variation in speech


Abstract

• Neural acoustic models can be used to automatically model pronunciation variation.
• Pronunciation variation is best captured by intermediate layers of transformer models.
• Transformer-based embeddings capture details not expressed by phonetic transcriptions.

Variation in speech is often quantified by comparing phonetic transcriptions of the same utterance. However, manually transcribing speech is time-consuming and error prone. As an alternative, therefore, we investigate the extraction of acoustic embeddings from several self-supervised neural models. We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and between Norwegian dialect speakers. For comparison with earlier studies, we evaluate how well these differences match human perception by comparing them with available human judgements of similarity. We show that speech representations extracted from a specific type of neural model (i.e. Transformers) lead to a better match with human perception than two earlier approaches on the basis of phonetic transcriptions and MFCC-based acoustic features. We furthermore find that features from the neural models are generally best extracted from one of the middle hidden layers rather than from the final layer. We also demonstrate that neural speech representations capture not only segmental differences, but also intonational and durational differences that cannot adequately be represented by a set of discrete symbols used in phonetic transcriptions.
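As a rough illustration of what a word-based difference between two recordings could look like, the sketch below aligns two sequences of per-frame feature vectors with dynamic time warping (DTW) and returns a length-normalised cost. This is an assumption for illustration only: the abstract does not name the alignment method, and the function names (`euclidean`, `dtw_distance`), the Euclidean frame distance, and the normalisation by total sequence length are all hypothetical choices. The toy vectors stand in for embeddings that would in practice come from a hidden layer of a neural model.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalised DTW cost between two sequences of frame vectors.

    Hypothetical helper: the normalisation by len(seq_a) + len(seq_b) is an
    assumption made so that longer words are not penalised for their length.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    # cost[i][j] = cheapest cumulative cost of aligning the first i frames
    # of seq_a with the first j frames of seq_b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of seq_a repeated
                cost[i][j - 1],      # frame of seq_b repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m] / (n + m)


# Toy "embeddings": the second word is the first with one frame held longer.
word_a = [[0.0, 0.0], [1.0, 1.0]]
word_b = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
print(dtw_distance(word_a, word_b))
```

Because DTW lets a frame in one sequence match several frames in the other, a purely durational stretch like the one in the toy example costs nothing under this particular setup; whether duration should be penalised is a design choice that depends on the features and the normalisation used.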


Similar resources

Modeling Pronunciation Variation for Speech Recognition

CERTIFICATE This is to certify that the work contained in this thesis titled Modeling Pronunciation Variation for Speech Recognition submitted by Gopala Krishna Anumanchipalli for the award of the degree of Master of Science (by Research) in Computer Science & Engineering is a bonafide record of research work carried out by him under our supervision. The contents of this thesis, in full or in p...

Modeling Pronunciation Variation for Cantonese Speech Recognition

Due to the large variability of pronunciation in spontaneous speech, pronunciation modeling becomes a more challenging and essential part in speech recognition. In this paper, we describe two different approaches of pronunciation modeling by using decision tree. At lexical level, a pronunciation variation dictionary is built to obtain alternative pronunciations for each word, in which each entr...

Speech Sound Perception and Neural Representations

This commentary reviews some of the main findings in speech sound perception using the brain imaging techniques and comments briefly on the recent findings by the session contributors. The main emphasis is on the experimental settings used in these studies. The aim is to demonstrate how the search for the neural correlates for abstract linguistic units has resulted in various types of experimen...

Modeling Pronunciation Variation in Automatic Speech Recognition

The performance of automatic speech recognition systems varies widely across different contexts. Very good performance can be achieved on single-speaker, large-vocabulary dictation in a clean acoustic environment, as well as on very small vocabulary tasks (such as digit recognition) with fewer constraints on the speakers and acoustic conditions. In other domains, such as meeting transcription o...

Modeling pronunciation variation using artificial neural networks for English spontaneous speech

Pronunciation variation in conversational speech has caused significant amount of word errors in large vocabulary automatic speech recognition. Rule-based approaches and decision-tree based approaches have been previously proposed to model pronunciation variation. In this paper, we report our work on modeling pronunciation variation using artificial neural networks (ANN). The results we achieve...

Journal

Journal title: Journal of Phonetics

Year: 2022

ISSN: 1095-8576, 0095-4470

DOI: https://doi.org/10.1016/j.wocn.2022.101137